The Discrete Acyclic Digraph Markov Model in Data Mining

Author: Castelo Valdueza J.R.
Publication date: 03/06/2002

Graphical Markov models are a powerful tool for describing complex interactions between the variables of a domain. They provide a succinct description of the joint distribution of the variables. This feature has led to their most successful application, namely as the core component of probabilistic expert systems. The theory behind this type of model draws on three disciplines: Statistics, Graph Theory and Artificial Intelligence. This interdisciplinary origin has yielded rich insight from different perspectives. There are two main ways to find the qualitative structure of a graphical Markov model: either the structure is specified by a domain expert, or "structural learning" is applied, i.e., the structure is automatically recovered from data. Structural learning requires comparing how well different models describe the data. This comparison is easy for, e.g., acyclic digraph Markov models. However, structural learning remains a hard problem because the number of possible models grows exponentially with the number of variables. The main contributions of this thesis are as follows. Firstly, a new class of graphical Markov models, called TCI models, is introduced. These models can be represented by labeled trees and form the intersection of two previously well-known classes. Secondly, the inclusion order of graphical Markov models is studied. From this study, two new learning algorithms are derived, one for heuristic search and the other for the Markov Chain Monte Carlo method. Both algorithms improve on the results of previous approaches without compromising the computational cost of the learning process. Finally, new diagnostics for assessing the convergence of the Markov Chain Monte Carlo method in structural learning are introduced. The results of this thesis are illustrated using both synthetic and real-world datasets.
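To give a feel for how quickly the space of candidate structures grows, the following minimal sketch counts the labeled acyclic digraphs on n nodes. It uses Robinson's recurrence, a standard combinatorial result that the abstract itself does not state; the function name `count_dags` is a hypothetical choice made for this illustration and is not taken from the thesis.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled acyclic digraphs on n nodes (Robinson's recurrence).

    Assumption: this recurrence is used here only as an illustration of
    the size of the search space; it is not a method from the thesis.
    """
    if n == 0:
        return 1  # the empty graph
    # Inclusion-exclusion over the k nodes that have no incoming edges:
    # each of them may point to any subset of the remaining n - k nodes.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, count_dags(n))
```

Already for five variables there are 29281 candidate structures, which suggests why exhaustive comparison is impractical and why heuristic search or Markov Chain Monte Carlo sampling is used instead.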

Utrecht University Repository